Intra-process progress exchange medium #807

Draft
frankmcsherry wants to merge 6 commits into master from progress_chain

@frankmcsherry

Member

An experimental, though not yet better, progress-tracking medium for multiple worker threads. The design is a concurrent data structure that supports more in-place aggregation and favors laggards: the threads committing updates consolidate as they go, leaving behind less of a mess than MPSCs do.

frankmcsherry and others added 6 commits June 12, 2026 16:57
…ll-reduce

A Chain<T> is a single-writer, multi-reader chain of atoms intended to
eventually replace the Progcaster's intra-process leg; a Mesh<T> bundles
one chain per writer, with readers that sweep all chains.

Atoms merge through T: Chainable, a commutative monoid, and may be
compacted (merged with adjacent atoms, never split) before a reader
folds them. Each reader folds every atom sent after its registration
exactly once; live state is bounded by O(#readers) independent of the
send count.
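
To make the monoid requirement concrete, here is a minimal sketch of what a `Chainable`-style trait and one instance could look like. The trait shape, the `merge` method name, the `Changes` type, and `fold_all` are all assumptions made for illustration; the actual trait in this branch may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `Chainable`: a commutative monoid.
/// `Default::default()` plays the role of the identity element.
pub trait Chainable: Default {
    /// Merge `other` into `self`; must be associative and commutative.
    fn merge(&mut self, other: Self);
}

/// Illustrative atom: net count change per timestamp, zero entries dropped.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Changes(pub BTreeMap<u64, i64>);

impl Changes {
    /// An atom holding a single `(time, diff)` update.
    pub fn single(time: u64, diff: i64) -> Self {
        let mut changes = Changes::default();
        changes.merge_one(time, diff);
        changes
    }

    /// Add `diff` at `time`, removing the entry if it nets to zero.
    fn merge_one(&mut self, time: u64, diff: i64) {
        if diff == 0 {
            return;
        }
        let entry = self.0.entry(time).or_insert(0);
        *entry += diff;
        if *entry == 0 {
            self.0.remove(&time);
        }
    }
}

impl Chainable for Changes {
    fn merge(&mut self, other: Self) {
        for (time, diff) in other.0 {
            self.merge_one(time, diff);
        }
    }
}

/// What a reader does in this sketch: fold every atom into one value.
pub fn fold_all<T: Chainable>(atoms: impl IntoIterator<Item = T>) -> T {
    let mut acc = T::default();
    for atom in atoms {
        acc.merge(atom);
    }
    acc
}
```

Because the merge is associative and commutative, merging two adjacent atoms before a reader folds them yields the same result as folding them one at a time, which is the property compaction relies on.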

Design highlights:
- Forward links; node payload (value, next) behind one RwLock so walkers
  snapshot consistently and compaction mutates atomically; per-node
  atomic holders counts with RAII pins.
- Writer fast path merges in place into an unpinned newest node (zero
  allocation in the common case), re-checking holders under the payload
  write lock to exclude concurrent pinning.
- Readers walk pin..=newest hand-over-hand, folding forward; recv_with
  hands individual atoms to the caller.
- Compaction absorbs a successor only when both nodes are unpinned and
  the successor is not newest; bypassing pinned nodes is unsound (their
  frozen next pointers are side entrances that later absorptions would
  move values behind). Boundedness instead comes from an oldest pointer
  and a full sweep at every recv and reader drop; the prefix behind all
  pins is reclaimed by Arc refcounts.
- Lock order: oldest mutex, then newest mutex, then payload locks older
  before newer.
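
The node layout and the writer fast path from the list above can be sketched roughly as follows. This is a simplified illustration, not the code in this branch: it omits the oldest pointer, reader walks, and compaction, uses `AddAssign` in place of `Chainable`, and the names `Node`, `Pin`, `pin_newest`, and the `bool` return of `send` are assumptions.

```rust
use std::ops::AddAssign;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, RwLock};

/// One atom: value and forward link share a single lock; pins are counted.
struct Node<T> {
    payload: RwLock<(T, Option<Arc<Node<T>>>)>,
    holders: AtomicUsize,
}

impl<T> Node<T> {
    fn new(value: T) -> Arc<Self> {
        Arc::new(Node {
            payload: RwLock::new((value, None)),
            holders: AtomicUsize::new(0),
        })
    }
}

/// RAII pin: the node counts as held until this guard is dropped.
pub struct Pin<T>(Arc<Node<T>>);

impl<T: Clone> Pin<T> {
    pub fn value(&self) -> T {
        self.0.payload.read().unwrap().0.clone()
    }
    pub fn has_next(&self) -> bool {
        self.0.payload.read().unwrap().1.is_some()
    }
}

impl<T> Drop for Pin<T> {
    fn drop(&mut self) {
        self.0.holders.fetch_sub(1, Ordering::SeqCst);
    }
}

pub struct Chain<T> {
    newest: Mutex<Arc<Node<T>>>,
}

impl<T: Default + AddAssign> Chain<T> {
    pub fn new() -> Self {
        Chain { newest: Mutex::new(Node::new(T::default())) }
    }

    /// Returns true if the value was merged in place (no allocation).
    pub fn send(&self, value: T) -> bool {
        // Lock order: newest mutex first, then the node's payload lock.
        let mut newest = self.newest.lock().unwrap();
        let node = Arc::clone(&*newest);
        let mut payload = node.payload.write().unwrap();
        // Re-check holders under the write lock: pins are taken under the
        // read lock, so none can appear while we hold the write lock.
        if node.holders.load(Ordering::SeqCst) == 0 {
            payload.0 += value;
            true
        } else {
            let fresh = Node::new(value);
            payload.1 = Some(Arc::clone(&fresh));
            *newest = fresh;
            false
        }
    }

    /// Pin the current newest node, as a reader that has caught up would.
    pub fn pin_newest(&self) -> Pin<T> {
        let newest = self.newest.lock().unwrap();
        let node = Arc::clone(&*newest);
        {
            let _payload = node.payload.read().unwrap();
            node.holders.fetch_add(1, Ordering::SeqCst);
        }
        Pin(node)
    }
}
```

In this sketch a pinned head is never mutated: once a reader holds it, the next `send` links a fresh node instead, so the value the reader observed stays frozen.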

Tests cover single/multiple/late readers, repeated and empty recvs,
laggard compaction bounds, reader-drop healing, the in-place fast path
(chain length exactly 2 after catch-up plus N sends), mesh delivery,
and a randomized multi-threaded stress test asserting totals and
per-chain length <= #readers + 2 at quiescent checkpoints.

See chain-design-notes.md for the contract, the compaction safety
argument, and deviations from the original sketch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Per-writer chains fail the primary goal of cross-writer cancellation:
two workers sending {(T,+1)} and {(T,-1)} at distinct increasing T build
internally-incompressible per-chain content whose union cancels to
nothing, leaving laggards O(elapsed) fold work. A single shared chain
merges concurrent sends at the head, so accumulated nodes hold the net.
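
The pathology is easy to reproduce in isolation. The sketch below uses plain maps rather than the chain itself, and the names `Batch`, `merge`, `per_writer_sizes`, and `shared_size` are illustrative assumptions: it compares the consolidated state of two per-writer accumulators against one shared accumulator for the same sends.

```rust
use std::collections::BTreeMap;

/// Consolidated updates: net diff per timestamp, zero entries dropped.
pub type Batch = BTreeMap<u64, i64>;

/// Add `diff` at `time`, removing the entry if it nets to zero.
pub fn merge(acc: &mut Batch, time: u64, diff: i64) {
    let entry = acc.entry(time).or_insert(0);
    *entry += diff;
    if *entry == 0 {
        acc.remove(&time);
    }
}

/// Two writers, each consolidating only its own sends.
/// Writer A sends (t, +1) and writer B sends (t, -1) for t in 0..n.
pub fn per_writer_sizes(n: u64) -> (usize, usize) {
    let (mut a, mut b) = (Batch::new(), Batch::new());
    for t in 0..n {
        merge(&mut a, t, 1);
        merge(&mut b, t, -1);
    }
    (a.len(), b.len())
}

/// The same sends merged into one shared accumulator.
pub fn shared_size(n: u64) -> usize {
    let mut shared = Batch::new();
    for t in 0..n {
        merge(&mut shared, t, 1);
        merge(&mut shared, t, -1);
    }
    shared.len()
}
```

Each per-writer accumulator retains one entry per timestamp because nothing within a single writer's stream cancels, while the shared accumulator stays empty.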

- Chain<T> is now a cloneable multi-writer handle; send(&self, value)
  locks the newest mutex (serializing writers), then the newest node's
  payload write lock (consistent with the documented lock ordering), and
  merges in place when holders == 0, allocating only when a reader has
  just caught up and pinned the head.
- Mesh<T> retained as a per-writer comparison structure, documented as
  benchmark-only with its known cancellation pathology.
- Tests: suite adapted to the new API; added multi-writer tests
  (sequential two-handle, cross-writer cancellation state bound,
  concurrent writers with one and many readers); stress test now shares
  one chain across writer threads. 15 chain tests pass; clippy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three scenarios (all keep up; laggard; unread backlog) across N workers
and cancellation fractions, with results and analysis in
chain-bench-results.md. Headline: the chain bounds backlog state and
laggard work by the live cancellation window (orders of magnitude below
the MPSC baseline, flat in N), and pays 2-7x on the tight-loop send path
where the head mutex contends.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A per-send version counter enables cheap change detection: recv returns
without taking any chain lock when no atom has been committed since the
reader last caught up, and the compaction sweep runs only after a
productive walk (when pins actually moved). This keeps spinning readers
off the chain's shared locks.
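
A minimal sketch of the version-counter idea, assuming a single mutex-protected total in place of the chain. The `Shared` and `Reader` types and the `reader_locks` instrumentation counter are hypothetical and exist only to make the lock-free early return observable.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

pub struct Shared {
    /// Bumped once per send, while the data lock is held.
    version: AtomicU64,
    total: Mutex<i64>,
    /// Instrumentation only: how often recv actually took the lock.
    reader_locks: AtomicU64,
}

impl Shared {
    pub fn new() -> Arc<Self> {
        Arc::new(Shared {
            version: AtomicU64::new(0),
            total: Mutex::new(0),
            reader_locks: AtomicU64::new(0),
        })
    }

    pub fn send(&self, diff: i64) {
        let mut total = self.total.lock().unwrap();
        *total += diff;
        self.version.fetch_add(1, Ordering::Release);
    }

    pub fn reader_locks(&self) -> u64 {
        self.reader_locks.load(Ordering::Relaxed)
    }
}

pub struct Reader {
    shared: Arc<Shared>,
    seen_version: u64,
    seen_total: i64,
}

impl Reader {
    pub fn new(shared: Arc<Shared>) -> Self {
        // Snapshot version and total together under the lock.
        let (seen_version, seen_total) = {
            let total = shared.total.lock().unwrap();
            (shared.version.load(Ordering::Acquire), *total)
        };
        Reader { shared, seen_version, seen_total }
    }

    /// Returns the change since the last call, or None if nothing was sent.
    pub fn recv(&mut self) -> Option<i64> {
        // Cheap check: no lock is taken when the version is unchanged.
        if self.shared.version.load(Ordering::Acquire) == self.seen_version {
            return None;
        }
        let total = self.shared.total.lock().unwrap();
        self.shared.reader_locks.fetch_add(1, Ordering::Relaxed);
        let delta = *total - self.seen_total;
        self.seen_total = *total;
        // Re-read under the lock so version and total stay consistent.
        self.seen_version = self.shared.version.load(Ordering::Acquire);
        Some(delta)
    }
}
```

A reader spinning on `recv` touches only the atomic counter until a writer commits, which is the behavior the commit message describes.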

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ing-key axis

Two more points on the merge-eagerness spectrum (ledger: total merge via
a shared consolidated map; cells: per-reader in-place accumulators) and a
key-type axis (u64 vs Box<[u64;3]>) modeling allocating timestamps, whose
clone/drop traffic is the scaling pathology the chain targets.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Synthetic three-scenario sweep plus an allocating-key matrix and a record
of real-workload measurements from a separate experimental Progcaster
wiring (not in this branch). Bottom line: the chain wins laggard work and
backlog memory decisively, loses single-socket throughput, and allocation
narrows but does not close that gap at 10 cores.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>